STOCHASTIC MONGE-KANTOROVICH PROBLEM AND ITS DUALITY* 



XICHENG ZHANG 

Abstract. In this article we prove the existence of a stochastic optimal transference plan for a
stochastic Monge-Kantorovich problem by means of a measurable selection theorem. A stochastic
version of Kantorovich duality and a characterization of stochastic optimal transference plans
are also established. Moreover, the Wasserstein distance between two probability kernels is discussed.



1. Introduction and Main Results

Let $X$ be a Polish space and $\mathcal{P}(X)$ the set of all probability measures on $(X, \mathcal{B}(X))$, where
$\mathcal{B}(X)$ is the Borel $\sigma$-field. It is well known that $\mathcal{P}(X)$ is a Polish space with respect to the weak
convergence topology. Let $\mathcal{B}(\mathcal{P}(X))$ be the associated Borel $\sigma$-field. Let $Y$ be another Polish
space and $c : X \times Y \to [0, \infty]$ be a lower semicontinuous function called the cost function. For
$\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, consider the classical Monge-Kantorovich problem



$$\mathcal{C}^{\mathrm{det}}(c,\mu,\nu) := \inf_{\pi\in\Pi(\mu,\nu)} \int_{X\times Y} c(x,y)\,\pi(dx,dy), \tag{1}$$

where $\Pi(\mu,\nu)$ denotes the set of all joint probability measures on $X \times Y$ with marginal
distributions $\mu$ and $\nu$. For the history and background of the Monge-Kantorovich problem, we refer to
[?], etc. An element of $\Pi(\mu,\nu)$ is called a transference plan; those achieving the infimum are
called optimal transference plans. We remark that the existence of an optimal transference plan is
easily obtained from the compactness of $\Pi(\mu,\nu)$ in $\mathcal{P}(X\times Y)$. Moreover, the following Kantorovich
duality formula holds (cf. [?] or [6, Theorem 5.10]):

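$$\mathcal{C}^{\mathrm{det}}(c,\mu,\nu) = \sup_{(\psi,\phi)\in L^1(\mu)\times L^1(\nu);\ \phi-\psi\le c}\left(\int_Y \phi(y)\,\nu(dy) - \int_X \psi(x)\,\mu(dx)\right). \tag{2}$$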
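To make problem (1) concrete, the following is a minimal sketch for a toy case that is not taken from the paper: $\mu$ and $\nu$ are assumed to be uniform on three real atoms each, and the cost is assumed to be $c(x,y) = (x-y)^2$. In this special situation it is enough to search over permutation couplings (Birkhoff's theorem on doubly stochastic matrices), so a brute-force search returns the value of (1). The function name `det_cost_uniform` and all data are hypothetical.

```python
from itertools import permutations

def det_cost_uniform(cost):
    """Brute-force value of problem (1) for uniform marginals on n atoms each.

    cost[i][j] = c(x_i, y_j). Only permutation couplings are searched; this
    suffices for uniform n-point marginals, whose couplings have permutation
    matrices (scaled by 1/n) as extreme points.
    """
    n = len(cost)
    best_value, best_sigma = float("inf"), None
    for sigma in permutations(range(n)):
        # Cost of the plan that sends mass 1/n from x_i to y_{sigma(i)}.
        value = sum(cost[i][sigma[i]] for i in range(n)) / n
        if value < best_value:
            best_value, best_sigma = value, sigma
    return best_value, best_sigma

# Hypothetical atoms and quadratic cost.
xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.0, 2.5]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
value, sigma = det_cost_uniform(cost)
print(value, sigma)
```

The search is factorial in the number of atoms and is only meant to illustrate the definition; it does not address the general (non-uniform or continuous) case treated in the text.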

We now turn to the description of stochastic versions of the Monge-Kantorovich problem and its
duality. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mu$ a probability kernel from $\Omega$ to $X$. Here, by
a probability kernel $\mu$ from $\Omega$ to $X$, we mean a mapping $\mu : \Omega \times \mathcal{B}(X) \to [0, 1]$ that satisfies

(i) for each $\omega \in \Omega$, $\mu_\omega \in \mathcal{P}(X)$; (ii) for each $B \in \mathcal{B}(X)$, $\omega \mapsto \mu_\omega(B)$ is $\mathcal{F}$-measurable.

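Let $Y$ be another Polish space and $\nu$ a probability kernel from $\Omega$ to $Y$. Let $c : \Omega \times X \times Y \to [0, \infty]$
be a measurable function called the stochastic cost function. Consider the following stochastic
Monge-Kantorovich problem:

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) := \inf_{\pi\in\mathcal{K}(\mu,\nu)} E\int_{X\times Y} c(\omega,x,y)\,\pi_\omega(dx,dy), \tag{3}$$

where $\mathcal{K}(\mu,\nu)$ is the set of all probability kernels from $\Omega$ to $X \times Y$ with marginal probability
kernels $\mu$ and $\nu$, i.e., for any $\pi \in \mathcal{K}(\mu,\nu)$,

$$\pi_\omega(\cdot, Y) = \mu_\omega, \qquad \pi_\omega(X, \cdot) = \nu_\omega.$$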



* This work is supported by NSFs of China (Nos. 10971076; 10871215).

If $\pi^{\mathrm{opt}} \in \mathcal{K}(\mu,\nu)$ attains the infimum in the minimization problem (3), we call it a stochastic
optimal transference plan. Unlike the deterministic problem (1), it seems to be hard to prove the
existence of a stochastic optimal transference plan by a direct compactness argument. In fact,
when the cost function $c$ is deterministic, the existence of $\pi^{\mathrm{opt}}$ has been obtained by Zhang
(see also [6, Corollary 5.22]). On the other hand, one may also expect the following stochastic
Kantorovich duality formula to hold:



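$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = \sup_{(\psi,\phi)\in L^1(\mu_\omega\times P)\times L^1(\nu_\omega\times P);\ \phi-\psi\le c} E\left(\int_Y \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right), \tag{4}$$

where $L^1(\mu_\omega \times P)$ denotes the set of all measurable functions $\psi$ with $E\int_X |\psi(\omega,x)|\,\mu_\omega(dx) < +\infty$,
and $\phi - \psi \le c$ means that $\phi(\omega,y) - \psi(\omega,x) \le c(\omega,x,y)$ for all $\omega, x, y$.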
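The easy half of a duality formula of type (4), namely that every admissible pair $(\psi,\phi)$ gives a lower bound for the cost of every kernel in $\mathcal{K}(\mu,\nu)$, can be checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: $\Omega$ has two points, $\mu_\omega$ and $\nu_\omega$ are uniform on two real atoms, $c(\omega,x,y) = (1+\omega)|x-y|$, $\psi$ is arbitrary, and $\phi$ is chosen as the $c(\omega)$-transform of $\psi$ restricted to the atoms. All names and numbers are hypothetical.

```python
from itertools import permutations

prob = {0: 0.3, 1: 0.7}                  # P on a two-point Omega (assumed)
xs = {0: [0.0, 1.0], 1: [0.0, 2.0]}      # atoms of mu_omega, uniform weights
ys = {0: [1.0, 2.0], 1: [1.0, 3.0]}      # atoms of nu_omega, uniform weights
psi = {0: [0.5, -1.0], 1: [2.0, 0.0]}    # arbitrary values psi(omega, x_i)

def c(omega, x, y):
    """Assumed stochastic cost c(omega, x, y) = (1 + omega) * |x - y|."""
    return (1 + omega) * abs(x - y)

def c_transform(omega):
    """phi(omega, y_j) = min_i [psi(omega, x_i) + c(omega, x_i, y_j)]."""
    return [min(p + c(omega, x, y) for p, x in zip(psi[omega], xs[omega]))
            for y in ys[omega]]

def mean(values):
    return sum(values) / len(values)

# Dual value E( int phi d(nu_omega) - int psi d(mu_omega) ).
dual = sum(P * (mean(c_transform(w)) - mean(psi[w])) for w, P in prob.items())

def plan_cost(omega, sigma):
    """Cost of the permutation plan sigma at a fixed omega."""
    n = len(xs[omega])
    return sum(c(omega, xs[omega][i], ys[omega][sigma[i]]) for i in range(n)) / n

# Primal value: the best permutation plan is chosen omega by omega.
primal = sum(P * min(plan_cost(w, s) for s in permutations(range(len(xs[w]))))
             for w, P in prob.items())
print(dual, primal)
```

By construction $\phi - \psi \le c$ on the atoms, so `dual <= primal` must hold; equality would require an optimal choice of $\psi$, which this sketch does not attempt.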

Our first result is about the existence of stochastic optimal transference plans. 

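Theorem 1.1. Assume that for each $\omega$, $(x,y) \mapsto c(\omega,x,y)$ is continuous, and for each $(x,y) \in X \times Y$,
$\omega \mapsto c(\omega,x,y)$ is $\mathcal{F}$-measurable and satisfies

$$\int_{X\times Y} c(\omega,x,y)\,\mu_\omega(dx)\,\nu_\omega(dy) < +\infty. \tag{5}$$

Then there exists a stochastic optimal transference plan $\pi^{\mathrm{opt}} \in \mathcal{K}(\mu,\nu)$ such that

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = E\int_{X\times Y} c(\omega,x,y)\,\pi^{\mathrm{opt}}_\omega(dx,dy) < +\infty. \tag{6}$$

Moreover, $\omega \mapsto \mathcal{C}^{\mathrm{det}}(c(\omega),\mu_\omega,\nu_\omega)$ is $\mathcal{F}$-measurable and we have

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = E\left(\inf_{\pi\in\Pi(\mu_\omega,\nu_\omega)}\int_{X\times Y} c(\omega,x,y)\,\pi(dx,dy)\right) = E\left(\mathcal{C}^{\mathrm{det}}(c(\omega),\mu_\omega,\nu_\omega)\right). \tag{7}$$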
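Formula (7) says that the stochastic problem reduces to solving the deterministic problem for each $\omega$ and then averaging. The following sketch illustrates this on a finite $\Omega$, where measurability of the selection is automatic; the measurable-selection difficulty addressed in the paper does not appear in such a toy case. The two-point $\Omega$, the uniform two-atom marginals and the scaled quadratic cost are all assumptions made for the illustration.

```python
from itertools import permutations

prob = {"rain": 0.3, "sun": 0.7}                  # P on a finite Omega (assumed)
mu = {"rain": [0.0, 1.0], "sun": [0.0, 2.0]}      # atoms of mu_omega, uniform
nu = {"rain": [1.0, 2.0], "sun": [1.0, 3.0]}      # atoms of nu_omega, uniform
scale = {"rain": 1.0, "sun": 2.0}

def c(omega, x, y):
    """Assumed stochastic cost: an omega-dependent multiple of (x - y)^2."""
    return scale[omega] * (x - y) ** 2

def plan_cost(omega, sigma):
    """Cost at omega of the plan sending mass 1/n from x_i to y_{sigma(i)}."""
    n = len(mu[omega])
    return sum(c(omega, mu[omega][i], nu[omega][sigma[i]]) for i in range(n)) / n

def select_optimal_plan(omega):
    """Pick one optimal permutation plan for the deterministic problem at omega."""
    n = len(mu[omega])
    return min(permutations(range(n)), key=lambda s: plan_cost(omega, s))

# omega -> pi^opt_omega, then the expectation of the omega-wise optimal costs.
opt_kernel = {w: select_optimal_plan(w) for w in prob}
sto_cost = sum(P * plan_cost(w, opt_kernel[w]) for w, P in prob.items())

# Any other kernel with the same marginals costs at least as much.
other_kernel = {"rain": (0, 1), "sun": (1, 0)}
other_cost = sum(P * plan_cost(w, other_kernel[w]) for w, P in prob.items())
print(sto_cost, other_cost)
```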

Remark 1.2. For fixed $\omega \in \Omega$, let $\mathcal{X}_\omega \subset \Pi(\mu_\omega,\nu_\omega)$ be the set of all optimal transference plans for
the deterministic problem (1). It is well known that $\mathcal{X}_\omega$ is a nonempty compact subset of $\mathcal{P}(X \times Y)$.
For proving Theorem 1.1, we have to carefully choose a measurable function $\omega \mapsto \pi^{\mathrm{opt}}_\omega$ so that
for each $\omega$, $\pi^{\mathrm{opt}}_\omega \in \mathcal{X}_\omega$. This seems not to be trivial, as shown in [7].

Our second result is about the stochastic Kantorovich duality. 

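Theorem 1.3. Keeping the same assumptions as in Theorem 1.1, we further have

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = \sup_{(\psi,\phi)\in L^1(\mu_\omega\times P)\times L^1(\nu_\omega\times P);\ \phi-\psi\le c} E\left(\int_Y \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right)$$

$$= \sup_{(\psi,\phi)\in \mathrm{Lip}^b(X)\times \mathrm{Lip}^b(Y);\ \phi-\psi\le c} E\left(\int_Y \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right), \tag{8}$$

where $\mathrm{Lip}^b(X)$ is the space of all bounded measurable functions $\psi(\omega,x)$ on $\Omega \times X$ which are
Lipschitz continuous in $x$ for each $\omega$; $\mathrm{Lip}^b(Y)$ is defined similarly.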

Our third result is about the characterization of stochastic optimal transference plans, which
corresponds to [6, Theorem 5.10 (ii)] (see also [?]).

Theorem 1.4. In the situation of Theorem 1.1, for any $\pi \in \mathcal{K}(\mu,\nu)$, the following statements
are equivalent:

(a) $\pi$ is a stochastic optimal transference plan;

(b) for almost all $\omega \in \Omega$, the support of $\pi_\omega$ is a $c(\omega)$-cyclically monotone set;

(c) there exists a pair of measurable functions $(\phi, \psi)$ on $\Omega \times Y$ and $\Omega \times X$ such that

$$\phi(\omega,y) - \psi(\omega,x) \le c(\omega,x,y), \qquad \forall (\omega,x,y) \in \Omega \times X \times Y,$$

and for each $\omega \in \Omega$, $\psi(\omega)$ is $c(\omega)$-convex and

$$\Gamma_\omega := \{(x,y) : \phi(\omega,y) - \psi(\omega,x) = c(\omega,x,y)\} \subset \partial_c\psi(\omega)$$

has $\pi_\omega$-full measure, where $\partial_c\psi(\omega)$ denotes the $c(\omega)$-subdifferential of $\psi(\omega,\cdot)$.

Moreover, the measurable set $\Gamma := \{(\omega,x,y) : (x,y) \in \Gamma_\omega\}$ defined from (c) may be chosen
independent of the optimal plan $\pi$. More precisely, let $\tilde\pi$ be another stochastic optimal plan;
then $\tilde\pi_\omega$ is concentrated on $\Gamma_\omega$ for almost all $\omega$.

Remark 1.5. In these theorems, if we only assume that $c$ is lower semicontinuous and approximate
it by the usual Lipschitz continuous functions (see (16) below), then we encounter a
very subtle issue about the measurability of an uncountable infimum of lower semicontinuous
functions (cf. [6, pp. 70-72]).

These three theorems will be proved in Section 3 by means of a measurable selection theorem. To this
end, we give some necessary preliminaries in Section 2. In Section 4, we define the
Wasserstein distance between two probability kernels and discuss its
properties. It is hoped that the results of the present paper can be used in the study of Markov
processes.

2. Preliminaries 

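Let $\mathscr{C}$ be the set of all nonnegative continuous cost functions $c : X \times Y \to [0, \infty)$, which is
endowed with the following metric:

$$d_{\mathscr{C}}(c_1,c_2) := \sum_{m=1}^{\infty} 2^{-m}\left(1 \wedge \sup_{(x,y)\in B_X^m(x_0)\times B_Y^m(y_0)} |c_1(x,y) - c_2(x,y)|\right),$$

where $(x_0, y_0) \in X \times Y$ is fixed and

$$B_X^m(x_0) := \{x \in X : d_X(x,x_0) < m\}, \qquad B_Y^m(y_0) := \{y \in Y : d_Y(y,y_0) < m\}.$$

It is easy to see that $(\mathscr{C}, d_{\mathscr{C}})$ is a complete metric space. Let $\mathbb{M}$ be defined by

$$\mathbb{M} := \left\{(c,\mu,\nu) \in \mathscr{C} \times \mathcal{P}(X) \times \mathcal{P}(Y) : \int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) < +\infty\right\}.$$

Then it is a metric space (maybe not complete and separable) under

$$d_{\mathbb{M}}((c_1,\mu_1,\nu_1),(c_2,\mu_2,\nu_2)) := d_{\mathscr{C}}(c_1,c_2) + d_{\mathcal{P}(X)}(\mu_1,\mu_2) + d_{\mathcal{P}(Y)}(\nu_1,\nu_2),$$

where $d_{\mathcal{P}(X)}$ and $d_{\mathcal{P}(Y)}$ are metrics for the weak convergence in $\mathcal{P}(X)$ and $\mathcal{P}(Y)$ respectively. We have: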
Lemma 2.1. Let $\{(c_n,\mu_n,\nu_n) \in \mathbb{M}, n \in \mathbb{N}\}$ satisfy

$$\sup_{n} \int_{X\times Y} c_n(x,y)\,\mu_n(dx)\,\nu_n(dy) \le M.$$

Assume that $(c_n,\mu_n,\nu_n)$ converges to $(c,\mu,\nu)$ in $\mathbb{M}$. Then

$$\int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) \le M.$$



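Proof. By Urysohn's lemma, there exist continuous functions $f_X^m : X \to [0, 1]$ and $f_Y^m : Y \to [0, 1]$
such that

$$f_X^m(x) = 1,\ x \in B_X^m(x_0), \qquad f_X^m(x) = 0,\ x \notin B_X^{m+1}(x_0)$$

and

$$f_Y^m(y) = 1,\ y \in B_Y^m(y_0), \qquad f_Y^m(y) = 0,\ y \notin B_Y^{m+1}(y_0).$$

Thus, by the monotone convergence theorem and the weak convergence of $\mu_n$ and $\nu_n$, we have

$$\int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) = \lim_{m\to\infty}\int_{X\times Y} (c(x,y)\wedge m)\, f_X^m(x) f_Y^m(y)\,\mu(dx)\,\nu(dy)$$

$$= \lim_{m\to\infty}\lim_{n\to\infty}\int_{X\times Y} (c(x,y)\wedge m)\, f_X^m(x) f_Y^m(y)\,\mu_n(dx)\,\nu_n(dy).$$

Since $c_n \to c$ in $\mathscr{C}$, we have

$$\lim_{n\to\infty}\ \sup_{(x,y)\in B_X^{m+1}(x_0)\times B_Y^{m+1}(y_0)} |c(x,y) - c_n(x,y)| = 0.$$

Hence,

$$\int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) = \lim_{m\to\infty}\lim_{n\to\infty}\int_{X\times Y} (c_n(x,y)\wedge m)\, f_X^m(x) f_Y^m(y)\,\mu_n(dx)\,\nu_n(dy) \le M.$$

The proof is complete. □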

We recall the following definitions of cyclical monotonicity and c-convexity (cf. [6, Definitions
5.1, 5.2]).

Definition 2.2. Let $X, Y$ be two arbitrary sets and $c : X \times Y \to (-\infty, \infty]$ be a function. A
subset $\Gamma \subset X \times Y$ is said to be c-cyclically monotone if for any $N \in \mathbb{N}$ and any family
$(x_1,y_1), \dots, (x_N,y_N)$ of points in $\Gamma$, the following inequality holds:

$$\sum_{i=1}^{N} c(x_i,y_i) \le \sum_{i=1}^{N} c(x_i,y_{i+1}), \qquad y_{N+1} = y_1.$$

A function $\psi : X \to (-\infty, +\infty]$ is said to be c-convex if it is not identically $+\infty$, and there exists
$\zeta : Y \to [-\infty, +\infty]$ such that

$$\psi(x) = \sup_{y\in Y}\,(\zeta(y) - c(x,y)), \qquad \forall x \in X.$$

Then its c-transform is defined by

$$\psi^c(y) := \inf_{x\in X}\,(\psi(x) + c(x,y)), \qquad \forall y \in Y,$$

and its c-subdifferential defined by

$$\partial_c\psi := \{(x,y) \in X \times Y : \psi^c(y) - \psi(x) = c(x,y)\}$$

is a c-cyclically monotone set.
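For a finite set $\Gamma$, c-cyclical monotonicity in the sense of Definition 2.2 can be checked by brute force. The sketch below is an illustration only: it tests families of pairwise distinct points, which is enough for a finite set because a cycle that revisits a point splits into shorter cycles whose inequalities add up. The function names, the quadratic cost on the real line and the two sample sets are assumptions, not objects from the paper.

```python
from itertools import permutations

def is_c_cyclically_monotone(points, c, tol=1e-12):
    """Check Definition 2.2 on a finite set of pairs (x_i, y_i).

    Every ordered family of distinct points of every length N >= 2 is tested:
    sum_i c(x_i, y_i) <= sum_i c(x_i, y_{i+1}) with y_{N+1} = y_1.
    """
    for n in range(2, len(points) + 1):
        for family in permutations(points, n):
            lhs = sum(c(x, y) for x, y in family)
            rhs = sum(c(family[i][0], family[(i + 1) % n][1]) for i in range(n))
            if lhs > rhs + tol:
                return False
    return True

def quadratic(x, y):
    """Assumed cost c(x, y) = (x - y)^2 on the real line."""
    return (x - y) ** 2

monotone_set = [(0.0, 0.5), (1.0, 2.0), (3.0, 2.5)]   # y increases with x
crossing_set = [(0.0, 2.0), (1.0, 0.5)]               # y decreases with x

print(is_c_cyclically_monotone(monotone_set, quadratic))
print(is_c_cyclically_monotone(crossing_set, quadratic))
```

The check is exponential in the size of the set and is meant for small examples; the tolerance `tol` only guards against floating-point rounding.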

We first prove the following slight extension of [5, Theorem 3] and [6, Theorem 5.20]. 

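Theorem 2.3. Assume that $(c_n,\mu_n,\nu_n) \to (c,\mu,\nu)$ in $\mathbb{M}$. Let $\pi_n$ be an optimal transference
plan for problem (1) associated with $c_n,\mu_n,\nu_n$. Then there exists a subsequence, still denoted
by $n$, such that $\pi_n$ weakly converges to some $\pi \in \Pi(\mu,\nu)$ and $\pi$ is an optimal transference plan
associated with $c,\mu,\nu$.

Proof. First of all, by [6, Lemma 4.4], $(\pi_n)_{n\in\mathbb{N}}$ is tight, and so there exists a subsequence, still
denoted by $n$, weakly converging to some $\pi \in \Pi(\mu,\nu)$.

By [6, Theorem 5.10], $\pi_n$ is concentrated on some $c_n$-cyclically monotone set $\Gamma_n$. For $N \in \mathbb{N}$,
let $C_n(N) \subset (X \times Y)^{\otimes N}$ be defined by

$$\sum_{i=1}^{N} c_n(x_i,y_i) \le \sum_{i=1}^{N} c_n(x_i,y_{i+1}), \qquad y_{N+1} = y_1,$$

where $(x_i,y_i)_{i=1}^{N} \in (X \times Y)^{\otimes N}$. Then $\pi_n^{\otimes N}$ is concentrated on $\Gamma_n^{\otimes N} \subset C_n(N)$.
For any $\varepsilon \in [0, 1]$, let $C_\varepsilon(N) \subset (X \times Y)^{\otimes N}$ be defined by

$$\sum_{i=1}^{N} c(x_i,y_i) \le \sum_{i=1}^{N} c(x_i,y_{i+1}) + \varepsilon, \qquad y_{N+1} = y_1,$$

where $(x_i,y_i)_{i=1}^{N} \in (X \times Y)^{\otimes N}$. Since $c_n \to c$ in $\mathscr{C}$, for any $\varepsilon \in (0, 1]$ and $N, m \in \mathbb{N}$, there exists an
$n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

$$C_n(N) \cap (B_X^m(x_0) \times B_Y^m(y_0))^{\otimes N} \subset C_\varepsilon(N) \cap (\overline{B_X^m(x_0)} \times \overline{B_Y^m(y_0)})^{\otimes N} =: A_\varepsilon^m(N).$$

Since $c$ is continuous, $A_\varepsilon^m(N)$ is closed. Hence,

$$\pi^{\otimes N}(A_\varepsilon^m(N)) \ge \varlimsup_{n\to\infty} \pi_n^{\otimes N}(A_\varepsilon^m(N)) \ge \varlimsup_{n\to\infty} \pi_n^{\otimes N}\big(C_n(N) \cap (B_X^m(x_0) \times B_Y^m(y_0))^{\otimes N}\big).$$

In view that $\pi_n^{\otimes N}$ is concentrated on $C_n(N)$, by letting $\varepsilon \downarrow 0$, we further have

$$\pi^{\otimes N}(A_0^m(N)) \ge \varlimsup_{n\to\infty}\big[\pi_n(B_X^m(x_0) \times B_Y^m(y_0))\big]^N \ge \Big(1 - \varlimsup_{n\to\infty}\big(\mu_n((B_X^m(x_0))^c) + \nu_n((B_Y^m(y_0))^c)\big)\Big)^N. \tag{9}$$

Noticing that $(\mu_n)_{n\in\mathbb{N}}$ and $(\nu_n)_{n\in\mathbb{N}}$ are tight, we have

$$\lim_{m\to\infty}\sup_{n} \mu_n((B_X^m(x_0))^c) = 0, \qquad \lim_{m\to\infty}\sup_{n} \nu_n((B_Y^m(y_0))^c) = 0.$$

Therefore, letting $m \to \infty$ on both sides of (9), we obtain that

$$\pi^{\otimes N}(C_0(N)) = 1, \qquad \forall N \in \mathbb{N},$$

which leads to

$$(\text{support of } \pi)^{\otimes N} = \text{support of } \pi^{\otimes N} \subset C_0(N), \qquad \forall N \in \mathbb{N}.$$

So the support of $\pi$ is c-cyclically monotone. Since $(c,\mu,\nu) \in \mathbb{M}$, we have

$$\mathcal{C}^{\mathrm{det}}(c,\mu,\nu) \le \int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) < +\infty.$$

By [6, Theorem 5.10] again, $\pi$ is an optimal transference plan associated with $c,\mu,\nu$. □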

The following lemma will be used in the proof of Theorem 1.3.

Lemma 2.4. Assume that $C_b(X \times Y) \ni c_n \uparrow c$ in the pointwise sense. Then

$$\mathcal{C}^{\mathrm{det}}(c,\mu,\nu) \le \varliminf_{n\to\infty} \mathcal{C}^{\mathrm{det}}(c_n,\mu,\nu).$$

Proof. Without loss of generality, we assume that

$$a := \varliminf_{n\to\infty} \mathcal{C}^{\mathrm{det}}(c_n,\mu,\nu) < +\infty.$$

In particular, there exists a subsequence, still denoted by $n$, such that

$$\lim_{n\to\infty} \mathcal{C}^{\mathrm{det}}(c_n,\mu,\nu) = a.$$

Let $\pi_n \in \Pi(\mu,\nu)$ be the optimal transference plan associated with $c_n,\mu,\nu$. Since $\Pi(\mu,\nu)$ is
weakly compact, there exists another subsequence $n_k$ such that $\pi_{n_k}$ weakly converges to some
$\pi_0 \in \Pi(\mu,\nu)$. By the monotonicity of $c_n$, we have for each $m \in \mathbb{N}$,

$$\int_{X\times Y} c_m(x,y)\,\pi_0(dx,dy) = \lim_{k\to\infty}\int_{X\times Y} c_m(x,y)\,\pi_{n_k}(dx,dy) \le \lim_{k\to\infty}\int_{X\times Y} c_{n_k}(x,y)\,\pi_{n_k}(dx,dy) = a.$$

On the other hand, by the monotone convergence theorem, we have

$$\mathcal{C}^{\mathrm{det}}(c,\mu,\nu) \le \int_{X\times Y} c(x,y)\,\pi_0(dx,dy) = \lim_{m\to\infty}\int_{X\times Y} c_m(x,y)\,\pi_0(dx,dy).$$

The result now follows. □

We also recall the following measurability theorem for multifunctions (cf. [?] or [3, p. 26,
Theorem 2.3]).

Theorem 2.5. Let $(W, \mathcal{W})$ be a measurable space and $X$ a Polish space. Let $\mathcal{X} : W \to \mathscr{F}$ be a
multifunction, where $\mathscr{F}$ is the set of all closed sets in $X$. Consider the following statements:

(1) for any closed set $A \subset X$,

$$\{w : \mathcal{X}(w) \cap A \ne \emptyset\} \in \mathcal{W};$$

(2) for any open set $A \subset X$,

$$\{w : \mathcal{X}(w) \cap A \ne \emptyset\} \in \mathcal{W};$$

(3) there exists a sequence $(\xi_n)_{n\in\mathbb{N}}$ of measurable selections of $\mathcal{X}$ such that for each $w \in W$,

$$\mathcal{X}(w) = \overline{\{\xi_n(w), n \in \mathbb{N}\}}.$$

Then it holds that (1) $\Rightarrow$ (2) $\Leftrightarrow$ (3).

The following lemma is useful. 

Lemma 2.6. The Borel $\sigma$-field $\mathcal{B}(\mathcal{P}(X))$ coincides with the $\sigma$-field generated by the mappings
$\mu \mapsto \mu(B)$, where $B \in \mathcal{B}(X)$.

Proof. Let $F$ be a closed set in $X$. Define

$$f_n(x) := \frac{1}{(1 + d_X(x,F))^n}.$$

Then $f_n(x) \downarrow 1_F(x)$. So, for any $r \in [0, 1]$,

$$\{\mu \in \mathcal{P}(X) : \mu(F) < r\} = \bigcup_{n\in\mathbb{N}} \{\mu \in \mathcal{P}(X) : \mu(f_n) < r\} \in \mathcal{B}(\mathcal{P}(X)).$$

The result now follows by a monotone class argument.

3. Proofs of Main Theorems

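In this section we give the proofs of Theorems 1.1, 1.3 and 1.4. First, we prove Theorem 1.1.

Proof of Theorem 1.1. Define a multi-valued map:

$$\mathbb{M} \ni (c,\mu,\nu) \mapsto \Phi(c,\mu,\nu) \subset \mathcal{P}(X \times Y),$$

where $\Phi(c,\mu,\nu)$ is the set of all optimal transference plans associated with $c,\mu,\nu$.

By Theorem 2.3, for each $(c,\mu,\nu) \in \mathbb{M}$, $\Phi(c,\mu,\nu)$ is a nonempty compact subset of $\mathcal{P}(X \times Y)$,
and for any closed set $A \subset \mathcal{P}(X \times Y)$,

$$\{(c,\mu,\nu) \in \mathbb{M}_m : \Phi(c,\mu,\nu) \cap A \ne \emptyset\} \text{ is a closed subset of } \mathbb{M},$$

where $\mathbb{M}_m := \{(c,\mu,\nu) \in \mathbb{M} : \int_{X\times Y} c(x,y)\,\mu(dx)\,\nu(dy) \le m\}$. Indeed, let $(c_n,\mu_n,\nu_n) \in \mathbb{M}_m$ with
$\Phi(c_n,\mu_n,\nu_n) \cap A \ne \emptyset$ converge to $(c,\mu,\nu)$. By Lemma 2.1, we have $(c,\mu,\nu) \in \mathbb{M}_m$. Let
$\pi_n \in \Phi(c_n,\mu_n,\nu_n) \cap A$ weakly converge (along a subsequence) to some $\pi \in \Pi(\mu,\nu)$. By Theorem 2.3,
$\pi \in \Phi(c,\mu,\nu)$. Since $A$ is closed, $\pi$ also belongs to $A$.
Note that

$$\{(c,\mu,\nu) \in \mathbb{M} : \Phi(c,\mu,\nu) \cap A \ne \emptyset\} = \bigcup_{m\in\mathbb{N}} \{(c,\mu,\nu) \in \mathbb{M}_m : \Phi(c,\mu,\nu) \cap A \ne \emptyset\}.$$

By Theorem 2.5, there exists a $\mathcal{B}(\mathbb{M})/\mathcal{B}(\mathcal{P}(X \times Y))$-measurable selection $(c,\mu,\nu) \mapsto \pi(c,\mu,\nu)$
such that for each $(c,\mu,\nu) \in \mathbb{M}$,

$$\pi(c,\mu,\nu) \in \Phi(c,\mu,\nu) \subset \Pi(\mu,\nu).$$

We now define

$$\pi^{\mathrm{opt}}_\omega := \pi(c(\omega),\mu_\omega,\nu_\omega).$$

Since $\omega \mapsto (c(\omega),\mu_\omega,\nu_\omega)$ is $\mathcal{F}/\mathcal{B}(\mathbb{M})$-measurable by Lemma 2.6, we thus have

$$\omega \mapsto \pi^{\mathrm{opt}}_\omega \text{ is } \mathcal{F}/\mathcal{B}(\mathcal{P}(X \times Y))\text{-measurable}. \tag{10}$$

In particular,

$$\omega \mapsto \int_{X\times Y} c(\omega,x,y)\,\pi^{\mathrm{opt}}_\omega(dx,dy) = \mathcal{C}^{\mathrm{det}}(c(\omega),\mu_\omega,\nu_\omega)$$

is $\mathcal{F}$-measurable and

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) \le E\int_{X\times Y} c(\omega,x,y)\,\pi^{\mathrm{opt}}_\omega(dx,dy) = E\left(\mathcal{C}^{\mathrm{det}}(c(\omega),\mu_\omega,\nu_\omega)\right).$$

The opposite inequality is clear. Thus, we complete the proof of (6) and (7). □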

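We now prove Theorem 1.3.

Proof of Theorem 1.3. We divide the proof into three steps.
(Step 1): First of all, for any $\pi \in \mathcal{K}(\mu,\nu)$, we have

$$\sup_{(\psi,\phi)\in L^1(\mu_\omega\times P)\times L^1(\nu_\omega\times P);\ \phi-\psi\le c} E\left(\int_Y \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right)$$

$$= \sup_{(\psi,\phi)\in L^1(\mu_\omega\times P)\times L^1(\nu_\omega\times P);\ \phi-\psi\le c} E\left(\int_{X\times Y} (\phi(\omega,y) - \psi(\omega,x))\,\pi_\omega(dx,dy)\right)$$

$$\le E\left(\int_{X\times Y} c(\omega,x,y)\,\pi_\omega(dx,dy)\right). \tag{11}$$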



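Thus, we obtain one side inequality:

$$\sup_{(\psi,\phi)\in L^1(\mu_\omega\times P)\times L^1(\nu_\omega\times P);\ \phi-\psi\le c} E\left(\int_Y \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right) \le \mathcal{C}^{\mathrm{sto}}(c,\mu,\nu).$$

(Step 2): In this step, we assume that $c(\omega,x,y)$ is bounded and Lipschitz continuous in $(x,y)$
for each $\omega$.

Let $\pi^{\mathrm{opt}}$ be the stochastic optimal transference plan constructed in Theorem 1.1. Let $\Gamma_\omega$ be
the support of $\pi^{\mathrm{opt}}_\omega$, a $c(\omega)$-cyclically monotone set. Note that for any open set $A \subset X \times Y$,

$$\{\omega : \Gamma_\omega \cap A \ne \emptyset\} = \{\omega : \pi^{\mathrm{opt}}_\omega(A) > 0\} \in \mathcal{F}.$$

By Theorem 2.5, there exists a sequence $(\xi_n(\omega), \eta_n(\omega))_{n\in\mathbb{N}}$ of measurable selections of $\Gamma_\omega$ such
that for each $\omega \in \Omega$,

$$\Gamma_\omega = \overline{\{(\xi_n(\omega), \eta_n(\omega)), n \in \mathbb{N}\}}. \tag{12}$$

Define for each $(\omega, x) \in \Omega \times X$,

$$\psi(\omega,x) := \sup_{m\in\mathbb{N}}\ \sup_{(x_1,y_1),\dots,(x_m,y_m)\in\Gamma_\omega} \Big\{[c(\omega,\xi_1(\omega),\eta_1(\omega)) - c(\omega,x_1,\eta_1(\omega))]$$
$$+ [c(\omega,x_1,y_1) - c(\omega,x_2,y_1)] + \cdots + [c(\omega,x_m,y_m) - c(\omega,x,y_m)]\Big\}. \tag{13}$$

Arguing as in [6, p. 65, Step 3], we know that

$$\psi(\omega, \xi_1(\omega)) = 0$$

and

$$\psi(\omega) \text{ is } c(\omega)\text{-convex}.$$

Since $c(\omega,x,y)$ is continuous with respect to $(x,y)$, by (12) we may write

$$\psi(\omega,x) = \sup_{m\in\mathbb{N}}\ \sup_{(x_1,y_1),\dots,(x_m,y_m)\in\{(\xi_n(\omega),\eta_n(\omega)),\, n\in\mathbb{N}\}} \Big\{[c(\omega,\xi_1(\omega),\eta_1(\omega)) - c(\omega,x_1,\eta_1(\omega))]$$
$$+ [c(\omega,x_1,y_1) - c(\omega,x_2,y_1)] + \cdots + [c(\omega,x_m,y_m) - c(\omega,x,y_m)]\Big\}. \tag{14}$$

Hence, for each $x \in X$, $\omega \mapsto \psi(\omega,x)$ is $\mathcal{F}$-measurable. Moreover, since $c$ is Lipschitz continuous
in $(x,y)$, it is easy to see that for each $\omega \in \Omega$, $x \mapsto \psi(\omega,x)$ is also Lipschitz continuous. Let
$\psi^c(\omega,y)$ be the c-transform of $\psi$ defined by

$$\psi^c(\omega,y) := \inf_{x\in X}\,(\psi(\omega,x) + c(\omega,x,y)).$$

Then for each $y \in Y$, $\omega \mapsto \psi^c(\omega,y)$ is also $\mathcal{F}$-measurable, and for each $\omega \in \Omega$, $y \mapsto \psi^c(\omega,y)$ is
Lipschitz continuous. Since $c$ is bounded, as in [6, p. 66, Step 4], $\psi^c$ and $\psi$ are bounded. Note
that (cf. [6, p. 65, Step 3])

$$\psi^c(\omega,y) - \psi(\omega,x) = c(\omega,x,y) \quad \text{on } \Gamma_\omega. \tag{15}$$

So

$$\int_Y \psi^c(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx) = \int_{X\times Y} c(\omega,x,y)\,\pi^{\mathrm{opt}}_\omega(dx,dy),$$

which then gives that

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = E\left(\int_Y \psi^c(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right).$$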
(Step 3): For general $c(\omega,x,y)$, define for $n \in \mathbb{N}$

$$c_n(\omega,x,y) := \inf_{(x',y')\in X\times Y} \Big\{\min(c(\omega,x',y'), n) + n\,[d_X(x,x') + d_Y(y,y')]\Big\}. \tag{16}$$

It is easy to see that $c_n$ is Lipschitz continuous, and

$$c_n(\omega,x,y) \le \min(c(\omega,x,y), n),$$

and for each $(\omega,x,y) \in \Omega \times X \times Y$,

$$c_n(\omega,x,y) \uparrow c(\omega,x,y), \qquad n \to \infty.$$
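The approximation (16) can be tried out on a small grid. The sketch below is an illustration under assumptions that are not from the paper: $\omega$ is fixed and dropped, $X$ and $Y$ are taken to be finite subsets of the real line with the usual distance (so the infimum becomes a minimum over grid points), and the cost is $10(x-y)^2$. It checks the three properties stated above: $c_n \le \min(c,n)$, monotonicity in $n$, and the Lipschitz bound with constant $n$ with respect to $d_X + d_Y$.

```python
def lipschitz_approx(cost, xs, ys, n):
    """Grid version of (16): min over (x', y') of min(c, n) + n * distance."""
    return [
        [
            min(
                min(cost[k][l], n) + n * (abs(x - xs[k]) + abs(y - ys[l]))
                for k in range(len(xs))
                for l in range(len(ys))
            )
            for y in ys
        ]
        for x in xs
    ]

# Hypothetical finite spaces X, Y and cost values c(x_k, y_l).
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0]
cost = [[10.0 * (x - y) ** 2 for y in ys] for x in xs]

# c_n for a few values of n; for n = 40 the penalty term dominates and c_n = c.
approx = {n: lipschitz_approx(cost, xs, ys, n) for n in (1, 5, 40)}
for n, table in approx.items():
    print(n, table)
```

On a finite grid the approximation becomes exact once $n$ exceeds both the maximum of $c$ and the ratio of that maximum to the smallest positive distance; in a general Polish space only the monotone pointwise convergence stated in the text is available.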
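Thus, by (7), Lemma 2.4 and Fatou's lemma, we have

$$\mathcal{C}^{\mathrm{sto}}(c,\mu,\nu) = E\left(\mathcal{C}^{\mathrm{det}}(c(\omega),\mu_\omega,\nu_\omega)\right) \le E\left(\varliminf_{n\to\infty} \mathcal{C}^{\mathrm{det}}(c_n(\omega),\mu_\omega,\nu_\omega)\right)$$

$$\le \varliminf_{n\to\infty} E\left(\mathcal{C}^{\mathrm{det}}(c_n(\omega),\mu_\omega,\nu_\omega)\right)$$

$$= \lim_{n\to\infty} E\left(\int_Y \phi_n(\omega,y)\,\nu_\omega(dy) - \int_X \psi_n(\omega,x)\,\mu_\omega(dx)\right), \tag{17}$$

where $\phi_n = \psi_n^{c_n} \in \mathrm{Lip}^b(Y)$ and $\psi_n \in \mathrm{Lip}^b(X)$ constructed in Step 2 satisfy

$$\phi_n(\omega,y) - \psi_n(\omega,x) \le c_n(\omega,x,y) \le c(\omega,x,y). \tag{18}$$

The proof is thus complete by combining with Step 1. □

Lastly, we prove Theorem 1.4.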

Proof of Theorem 1.4. (a) $\Rightarrow$ (b): Let $\pi \in \mathcal{K}(\mu,\nu)$ be a stochastic optimal transference plan, and
let $(\phi_n, \psi_n)_{n\in\mathbb{N}}$ be as in (17). By (11) and (17), we have

$$\lim_{n\to\infty} E\left(\int_{X\times Y} [c(\omega,x,y) - \phi_n(\omega,y) + \psi_n(\omega,x)]\,\pi_\omega(dx,dy)\right) = 0.$$

If necessary, by extracting a subsequence and by (18), there is an $\Omega_0 \in \mathcal{F}$ with $P(\Omega_0) = 1$ such
that for each $\omega \in \Omega_0$,

$$\lim_{n\to\infty} \int_{X\times Y} [c(\omega,x,y) - \phi_n(\omega,y) + \psi_n(\omega,x)]\,\pi_\omega(dx,dy) = 0.$$

Fix such an $\omega$. Up to choosing a subsequence (possibly depending on $\omega$), we can assume that
for $\pi_\omega$-almost all $(x,y) \in X \times Y$,

$$\lim_{n\to\infty} \big(\phi_n(\omega,y) - \psi_n(\omega,x)\big) = c(\omega,x,y).$$

For $N \in \mathbb{N}$, by passing to the limit in the inequality

$$\sum_{i=1}^{N} c(\omega,x_i,y_{i+1}) \ge \sum_{i=1}^{N} [\phi_n(\omega,y_{i+1}) - \psi_n(\omega,x_i)] = \sum_{i=1}^{N} [\phi_n(\omega,y_i) - \psi_n(\omega,x_i)],$$

we find that $\pi_\omega^{\otimes N}$ is concentrated on the closed set

$$\left\{(x_i,y_i)_{i=1}^{N} \in (X \times Y)^{\otimes N} : \sum_{i=1}^{N} c(\omega,x_i,y_{i+1}) \ge \sum_{i=1}^{N} c(\omega,x_i,y_i)\right\}.$$

So the support of $\pi_\omega$ is $c(\omega)$-cyclically monotone.

(b) $\Rightarrow$ (c): Fix $\pi \in \mathcal{K}(\mu,\nu)$ and set $\Gamma_\omega := \mathrm{supp}(\pi_\omega)$. Since we can redefine $\pi$ on a $P$-negligible
set, without loss of generality, we can assume that for all $\omega \in \Omega$, $\Gamma_\omega$ is $c(\omega)$-cyclically monotone.
Define a $c(\omega)$-convex function $\psi(\omega,x)$ as in (13) in terms of $\Gamma_\omega$. From (14), we know that $\psi$ is
an $\mathcal{F} \times \mathcal{B}(X)$-measurable function and for each $\omega$, $x \mapsto \psi(\omega,x)$ is lower semicontinuous. Let
$\psi^c$ be the $c(\omega)$-transform of $\psi(\omega)$, i.e.,

$$\psi^c(\omega,y) := \inf_{x\in X}\,(\psi(\omega,x) + c(\omega,x,y)).$$

Since $\psi^c$ is the infimum of uncountably many measurable functions, it is not known whether
$\psi^c$ is $\mathcal{F} \times \mathcal{B}(Y)$-measurable. As in [1, p. 133, Step 2] or [6, p. 72], we can modify $\psi^c$ on a
$\nu_\omega(dy)P(d\omega)$-negligible set so that it becomes measurable. First, we disintegrate $\pi_\omega(dx,dy)P(d\omega)$
as $\pi_\omega(dx|y)\,\nu_\omega(dy)\,P(d\omega)$ and define an $\mathcal{F} \times \mathcal{B}(Y)$-measurable function

$$\tilde\phi(\omega,y) := \int_X [\psi(\omega,x) + c(\omega,x,y)] \cdot 1_{\Gamma_\omega}(x,y)\,\pi_\omega(dx|y).$$

Since $\pi_\omega(\Gamma_\omega) = 1$ and $\Gamma_\omega \subset \partial_c\psi(\omega)$ (see (15)), there exists a measurable set $A \in \mathcal{F} \times \mathcal{B}(Y)$ with
$\int_A \nu_\omega(dy)\,P(d\omega) = 1$ such that for all $(\omega,y) \in A$,

$$\tilde\phi(\omega,y) = \psi^c(\omega,y) \int_X 1_{\Gamma_\omega}(x,y)\,\pi_\omega(dx|y) = \psi^c(\omega,y).$$

Let us define an $\mathcal{F} \times \mathcal{B}(Y)$-measurable function by

$$\phi(\omega,y) := \begin{cases} \tilde\phi(\omega,y) = \psi^c(\omega,y), & (\omega,y) \in A; \\ -\infty, & (\omega,y) \notin A. \end{cases}$$

Then, it is easy to check that $(\phi, \psi)$ has the desired properties.

(c) $\Rightarrow$ (a): Arguing as in [5, Theorem 2] or [6, p. 72, (d) $\Rightarrow$ (a)], we can prove it by a truncation
argument.

Moreover, let $\tilde\pi$ be another stochastic optimal plan. As in [6, p. 73, (a) $\Rightarrow$ (e)], we can prove that

$$E\int_{X\times Y} [c(\omega,x,y) - \phi(\omega,y) + \psi(\omega,x)]\,\tilde\pi_\omega(dx,dy) = 0.$$

Hence, for almost all $\omega$, $\tilde\pi_\omega$ is concentrated on

$$\Gamma_\omega := \{(x,y) \in X \times Y : \phi(\omega,y) - \psi(\omega,x) = c(\omega,x,y)\}.$$

The whole proof is finished. □

4. Wasserstein Metric between Two Probability Kernels 

In this section, we define the Wasserstein metric on the space of all probability kernels and
discuss its properties. Let $(X, d_X)$ be a metric space. For $p \ge 1$, let $\mathcal{K}_p(X)$ be the space of all
probability kernels $\mu$ from $\Omega$ to $X$ with

$$\int_X d_X(x,x_0)^p\,\mu_\omega(dx) < +\infty$$

for some $x_0 \in X$ (hence for all $x_0 \in X$). Let us define for $\mu, \nu \in \mathcal{K}_p(X)$

$$\mathcal{W}_p(\mu,\nu) := \left(\inf_{\pi\in\mathcal{K}(\mu,\nu)} E\int_{X\times X} d_X(x,y)^p\,\pi_\omega(dx,dy)\right)^{1/p},$$

which is called the Wasserstein distance. By Theorem 1.1, we have

$$\mathcal{W}_p(\mu,\nu) = \big(E\,W_p(\mu_\omega,\nu_\omega)^p\big)^{1/p}, \tag{19}$$

where $W_p(\mu_\omega,\nu_\omega) = \mathcal{C}^{\mathrm{det}}(d_X^p,\mu_\omega,\nu_\omega)^{1/p}$ is the usual Wasserstein distance between the probability
measures $\mu_\omega$ and $\nu_\omega$.

The following result is a direct consequence of (19) and [6, Theorem 6.18].

Theorem 4.1. Let $(X, d_X)$ be a complete and separable metric space, and $(\Omega, \mathcal{F}, P)$ a separable
probability space. Then for any $p \ge 1$, $(\mathcal{K}_p(X), \mathcal{W}_p)$ is also a complete and separable metric
space.
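Formula (19) reduces the distance between two kernels to an $L^p(P)$-type average of ordinary Wasserstein distances. As a small illustration, the sketch below evaluates it for a finite $\Omega$ under assumptions that are not part of the paper: $X$ is the real line, and each $\mu_\omega$, $\nu_\omega$ is uniform on equally many atoms, in which case $W_p$ is obtained by pairing the sorted atoms (a standard fact for convex costs on the line). All names and data are hypothetical.

```python
def wasserstein_p_1d(xs, ys, p):
    """W_p between uniform measures on equally many real atoms (sorted pairing)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch needs the same number of atoms")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)

def kernel_wasserstein(prob, mu, nu, p):
    """Right-hand side of (19): ( E[ W_p(mu_omega, nu_omega)^p ] )^(1/p)."""
    mean_pth_power = sum(P * wasserstein_p_1d(mu[w], nu[w], p) ** p
                         for w, P in prob.items())
    return mean_pth_power ** (1.0 / p)

# Hypothetical two-point Omega and kernels given by their atoms.
prob = {"a": 0.5, "b": 0.5}
mu = {"a": [0.0, 1.0], "b": [0.0, 0.0]}
nu = {"a": [2.0, 1.0], "b": [3.0, 3.0]}

distance = kernel_wasserstein(prob, mu, nu, p=2)
print(distance)
```

Here $W_2(\mu_a,\nu_a) = 1$ and $W_2(\mu_b,\nu_b) = 3$, so the kernel distance is $\sqrt{0.5\cdot 1 + 0.5\cdot 9} = \sqrt{5}$.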

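We now consider the case of $p = 1$. In this case, the Wasserstein distance is usually called the
Kantorovich-Rubinstein distance. We have:

Theorem 4.2. For any $\mu, \nu \in \mathcal{K}_1(X)$,

$$\mathcal{W}_1(\mu,\nu) = \sup_{\psi\in\mathrm{Lip}^b(X);\ \|\psi(\omega)\|_{\mathrm{Lip}}\le 1} E\left(\int_X \psi(\omega,x)\,\nu_\omega(dx) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right),$$

where

$$\|\psi(\omega)\|_{\mathrm{Lip}} := \sup_{x\ne x'\in X} \frac{|\psi(\omega,x) - \psi(\omega,x')|}{d_X(x,x')}.$$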



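Proof. By Theorem 1.3, we only need to prove that

$$\sup_{(\psi,\phi)\in\mathrm{Lip}^b(X)\times\mathrm{Lip}^b(X);\ \phi-\psi\le d_X} E\left(\int_X \phi(\omega,y)\,\nu_\omega(dy) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right) \tag{20}$$

$$= \sup_{\psi\in\mathrm{Lip}^b(X);\ \|\psi(\omega)\|_{\mathrm{Lip}}\le 1} E\left(\int_X \psi(\omega,x)\,\nu_\omega(dx) - \int_X \psi(\omega,x)\,\mu_\omega(dx)\right). \tag{21}$$

Assume that $\phi(\omega,y) - \psi(\omega,x) \le d_X(x,y)$. Then

$$\phi(\omega,y) \le \inf_{x\in X}\,(\psi(\omega,x) + d_X(x,y)) =: \psi^d(\omega,y)$$

and

$$\psi(\omega,x) \ge \sup_{y\in X}\,(\psi^d(\omega,y) - d_X(x,y)) =: \psi^{dd}(\omega,x).$$

Thus,

$$(20) \le \sup_{\psi\in\mathrm{Lip}^b(X)} E\left(\int_X \psi^d(\omega,y)\,\nu_\omega(dy) - \int_X \psi^{dd}(\omega,x)\,\mu_\omega(dx)\right).$$

On the other hand, it is easy to verify

$$\|\psi^d(\omega)\|_{\mathrm{Lip}} \le 1,$$

and so,

$$\psi^d(\omega,x) = \psi^{dd}(\omega,x).$$

Hence, (20) $\le$ (21). Moreover, (20) $\ge$ (21) is obvious. The proof is complete. □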
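The two facts used in this proof, that $\psi^d$ is 1-Lipschitz and that $\psi^{dd} = \psi^d$, can be checked numerically on a finite metric space. The sketch below is an illustration under assumptions not taken from the paper: $\omega$ is fixed and dropped, $X$ is a four-point subset of the real line with $d_X(x,y) = |x-y|$, and the values of $\psi$ are arbitrary.

```python
def d_transform(psi, pts):
    """psi^d(y) = min_x [psi(x) + |x - y|] over the finite space pts."""
    return [min(psi[i] + abs(pts[i] - y) for i in range(len(pts))) for y in pts]

def d_double_transform(psi_d, pts):
    """psi^dd(x) = max_y [psi^d(y) - |x - y|] over the finite space pts."""
    return [max(psi_d[j] - abs(x - pts[j]) for j in range(len(pts))) for x in pts]

def lipschitz_constant(values, pts):
    """Largest ratio |f(x) - f(x')| / |x - x'| over distinct points."""
    return max(abs(values[i] - values[j]) / abs(pts[i] - pts[j])
               for i in range(len(pts)) for j in range(len(pts)) if i != j)

# Hypothetical finite metric space and an arbitrary (non-Lipschitz-1) psi.
pts = [0.0, 1.0, 2.5, 4.0]
psi = [0.0, 3.0, -1.0, 5.0]

psi_d = d_transform(psi, pts)
psi_dd = d_double_transform(psi_d, pts)
print(psi_d, psi_dd, lipschitz_constant(psi_d, pts))
```

In this example $\psi$ itself has Lipschitz constant larger than 1, while $\psi^d$ does not, and the second transform leaves $\psi^d$ unchanged.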